On September 26, 2016 at 11:47 a.m. U.S. Central Daylight Time (16:47 UTC), the Cedar and Wapsipinicon rivers in Iowa surged, producing a flood wave that breached the river banks. The water level of the Cedar River measured ~20 feet (8 feet above flood stage) near the city of Cedar Rapids.
The water level continued to rise until it peaked at ~22 feet on September 27. This event had been exceeded only once, in June 2008, when thousands of people were encouraged to evacuate from Cedar Rapids, the second-most-populous city in Iowa.
In this lab we are interested in the impacts in Palo, Iowa, because it is upstream of Cedar Rapids, contains a large amount of farmland, and does not have a forecast location to provide warning.
We will use the raster package and our understanding of raster data and categorization to create flood images using multiband Landsat imagery, thresholding, and classification methods.
library(raster) # Raster Data handling
library(tidyverse) # Data Manipulation
library(getlandsat) # keyless Landsat data (2013-2017)
library(osmdata) # OpenStreetMap data access
library(sf) # Vector data processing
library(mapview) # Rapid Interactive visualization
Remote sensing/image analysis usually begins with the same steps:
Using uscities.csv, filter to Palo, Iowa and create a 5 kilometer buffer.
From the Palo, Iowa feature:
This region defines the AOI for this analysis.
bb = read.csv('../data/uscities.csv') %>%
filter(city == "Palo") %>%
st_as_sf(coords = c("lng", "lat"), crs = 4326) %>%
st_transform(5070) %>%
st_buffer(5000) %>%
st_bbox() %>%
st_as_sfc() %>%
st_as_sf()
For our analysis we will be using data from Landsat 8. Landsat 8 is the newest generation of the Landsat satellites and provides a useful resource for water detection. The OLI sensor aboard Landsat 8 has nine bands for capturing the spectral response of the earth’s surface at discrete wavelengths along the electromagnetic spectrum.
Amazon hosts all Landsat 8 data in a bucket associated with the OpenData Registry Initiative.
The getlandsat R package provides a nice interface to this product for images taken between 2013-2017. Current efforts to extend this resource are underway using the SAT-API.
To find our images, we first need to list all available scenes, filter to those that meet our criteria (date and bounding box), and isolate each scene's unique identifier.
To do this:
#code is from lab05.R file
bbwgs = bb %>% st_transform(4326)
bb = st_bbox(bbwgs)
osm = osmdata::opq(bbwgs) %>%
  add_osm_feature(key = "natural") %>% # or key = "building"
  osmdata_sf()
#mapview(osm$osm_polygons)
scenes = lsat_scenes()
down = scenes %>%
filter(min_lat <= bb$ymin, max_lat >= bb$ymax,
min_lon <= bb$xmin, max_lon >= bb$xmax,
as.Date(acquisitionDate) == as.Date("2016-09-26"))
#write.csv(down, file = "data/palo-flood-scene.csv")
We need to download and cache the data on our computers.
Caching essentially means we will download the data to a standardized location known to the downloading utility.
Before any data is downloaded, the utility will check if it already exists. If it does, the path to the file is returned; if it does not, the data is downloaded.
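The check-then-download pattern can be sketched in a few lines. The `cached_download()` helper below is hypothetical, purely to illustrate the logic; getlandsat manages its cache for you.

```r
# Hypothetical helper illustrating the caching pattern described above:
# download only on a cache miss, and always return the local path.
cached_download = function(url, cache_dir = tempdir()) {
  dest = file.path(cache_dir, basename(url))
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")  # cache miss: fetch the file
  }
  dest  # cache hit or miss, return the path to the local copy
}
```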
The getlandsat package provides a nice caching system. To download and cache image data we need to do the following:
We can do this using grepl like last week! However, we will expand on our pattern matching techniques using multiple patterns and a constraint:
meta = read_csv("../data/palo-flood-scene.csv")
files = lsat_scene_files(meta$download_url) %>%
filter(grepl(paste0("B", 1:6, "\\.TIF$", collapse = "|"), file)) %>%
arrange(file) %>%
pull(file)
Now that we have the URLs of the files we want, we need to download them to our cache location, using getlandsat
lsat_image will take a file name - like those you created in step 2 - and download the file to your cache. If the download has already occurred, the path to the cached image is returned without re-downloading the data.
Right now we have 6 files we want, 1 for each band. So we want to apply lsat_image over this vector of files to return a set of local file paths. For this we can use the apply family in base R.
lapply returns a list of elements resulting from a specified function (FUN) applied to all inputs.
sapply is a wrapper of lapply that returns a vector, matrix or, if simplify = "array", an array rather than a list.
vapply is similar to sapply, but has a pre-specified type of return value, so it can be safer (and sometimes faster) to use.
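A toy comparison of the three, using a stand-in for lsat_image (the real function hits the network):

```r
files_demo = c("B1.TIF", "B2.TIF")              # stand-in file names
fake_image = function(f) file.path("cache", f)  # stand-in for lsat_image

lapply(files_demo, fake_image)                  # a list of two paths
sapply(files_demo, fake_image)                  # a named character vector
vapply(files_demo, fake_image, character(1))    # vector, return type enforced
```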
We want to apply the lsat_image function over our file names to return a vector of local file paths, so we can use sapply where the files are the input and FUN = lsat_image.
st = sapply(files, lsat_image)
s = stack(st) %>% setNames(c(paste0("band", 1:6)))
s
## class : RasterStack
## dimensions : 7811, 7681, 59996291, 6 (nrow, ncol, ncell, nlayers)
## resolution : 30, 30 (x, y)
## extent : 518085, 748515, 4506585, 4740915 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=15 +datum=WGS84 +units=m +no_defs
## names : band1, band2, band3, band4, band5, band6
## min values : 0, 0, 0, 0, 0, 0
## max values : 65535, 65535, 65535, 65535, 65535, 65535
We only want to analyze our image for the regions surrounding Palo (our AOI). We will transform our AOI to the CRS of the landsat stack and use it to crop the raster stack.
cropper = bbwgs %>%
st_transform(crs(s))
r = crop(s, cropper)
r
## class : RasterBrick
## dimensions : 340, 346, 117640, 6 (nrow, ncol, ncell, nlayers)
## resolution : 30, 30 (x, y)
## extent : 594075, 604455, 4652535, 4662735 (xmin, xmax, ymin, ymax)
## crs : +proj=utm +zone=15 +datum=WGS84 +units=m +no_defs
## source : memory
## names : band1, band2, band3, band4, band5, band6
## min values : 8566, 7721, 6670, 6057, 5141, 4966
## max values : 18482, 19079, 19592, 21965, 24772, 23896
We have loaded the bands as a multiband raster object in R and cropped the domain to our AOI. Let's make a few RGB plots to see what these images reveal.
115 A-C covers the notion of spectral signatures (bands) and spectral combinations in greater detail, as well as common tasks like atmospheric correction. We don't need to worry much about correction here since we are using a LT1 product (check the processingLevel in the metadata).
Standard cameras replicate what is seen with the human eye by capturing light in the red, green, and blue wavelengths and applying red, green, and blue filters (channels) to generate a natural-looking RGB image.
With a multispectral Landsat 8 image, we have more information to work with and different wavelengths/combinations can help isolate particular features.
For example, the Near Infrared (NIR) wavelength is commonly used to analyze vegetation health because vegetation reflects strongly in this portion of the electromagnetic spectrum. Alternatively, the Shortwave Infrared (SWIR) bands are useful for discerning what is wet and dry.
When working with Landsat imagery, a logical first step is to load an image into an image analysis program (like ENVI) to visualize what is in the scene. We can do the same thing in R using the plotRGB function, selecting which band should populate each channel.
Rename raster stack with the names of the band (e.g. “coastal”, “red”, “green”, …)
par(mfrow = c(2, 2))
# Natural color
plotRGB(r, r = 4, g = 3, b = 2, axes = F, main = "Natural")
# Color infrared
plotRGB(r, r = 5, g = 4, b = 3, axes = F, main = "IR")
# False color, water focus (band 7 is not in our 6-band stack, so we use band 6)
plotRGB(r, r = 5, g = 6, b = 1, axes = F, main = "False Water Focus")
# False color, agriculture focus
plotRGB(r, r = 6, g = 5, b = 2, axes = F, main = "False Ag. Focus")
par(mfrow = c(2, 2))
# linear stretch
plotRGB(r, r = 4, g = 3, b = 2, stretch = "lin", main = "lin")
# histogram stretch
plotRGB(r, r = 4, g = 3, b = 2, stretch = "hist", main = "hist")
The purpose of applying a color stretch is to refine color contrast. In this example, I prefer the "lin" stretch because it groups the landscape into relative types; the rural lands are more similar in color than in the "hist" stretch.
Accurate assessment of surface water features (like flooding) has been made possible by remote sensing technology. Index methods are commonly used for surface water estimation using a threshold value.
For this lab we will look at 5 unique thresholding methods for delineating surface water features from different combinations of Landsat bands.
Raster Algebra

* Create 5 new rasters using the formulas for NDVI, NDWI, MNDWI, WRI and SWI
* Combine those new rasters into a stacked object and set the names of your new stack to useful values
* Plot the new stack using the following palette: `colorRampPalette(c("blue", "white", "red"))(256)`
* Describe the 5 images. How are they similar and where do they deviate?
#NDVI
ndvi = (r$band5 - r$band4) / (r$band5 + r$band4)
#NDWI
ndwi = (r$band3 - r$band5) / (r$band3 + r$band5)
#MNDWI
mndwi = (r$band3 - r$band6) / (r$band3 + r$band6)
#WRI
wri = (r$band3 + r$band4) / (r$band5 + r$band6)
#SWI
swi = 1 / sqrt(r$band2 - r$band6)
#stack
clr_st = stack(ndvi, ndwi, mndwi, wri, swi) %>%
setNames(c("NDVI", "NDWI", "MNDWI", "WRI", "SWI"))
plot(clr_st)
dim(clr_st)
## [1] 340 346 5
#plot
plot(clr_st, col = colorRampPalette(c("blue", "white", "red"))(256))
Here we will extract the flood extents from each of the above rasters.
For this, we will use the calc function and apply a custom formula to each index from step 1 that applies the threshold in a way that flooded cells are 1 and non-flooded cells are 0.
# flooded = 1, non-flooded = 0 (repeat with the appropriate rule for each index)
thresholding = function(x){ ifelse(x <= 0, 1, 0) }
flood = calc(ndvi, thresholding)
plot(flood, col = c("white", "blue"))
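The same pattern extends to all five indices. In the sketch below, the cutoffs are common water-detection choices from the literature, not values prescribed by this lab, and the tiny random stack stands in for clr_st; verify each rule against your own imagery.

```r
library(raster)

# Demo 5-layer stack standing in for clr_st (random values in [-1, 1])
set.seed(1)
idx = stack(lapply(1:5, function(i) raster(matrix(runif(16, -1, 1), 4, 4))))
names(idx) = c("NDVI", "NDWI", "MNDWI", "WRI", "SWI")

# One threshold rule per index; water/flood = 1, dry = 0
rules = list(
  NDVI  = function(x) ifelse(x <= 0, 1, 0),
  NDWI  = function(x) ifelse(x >= 0, 1, 0),
  MNDWI = function(x) ifelse(x >= 0, 1, 0),
  WRI   = function(x) ifelse(x >= 1, 1, 0),
  SWI   = function(x) ifelse(x <= 5, 1, 0)
)

flood_stack = stack(lapply(names(rules), function(n) calc(idx[[n]], rules[[n]])))
names(flood_stack) = names(rules)
plot(flood_stack, col = c("white", "blue"))
```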
An alternative way to identify similar features in a continuous field is through supervised or unsupervised classification. Supervised classification groups values (cells) based on user-supplied "truth" locations. Since flood events are fast-occurring, there are rarely truth points for a live event. Instead, developers rely on libraries of flood spectral signatures.
Unsupervised classification finds statistically significant groupings within the data. In these clustering algorithms, the user specifies the number of classes and the categorization is created based on the patterns in the data.
For this lab we will use a simple k-means algorithm to group raster cells with similar spectral properties.
Anytime we want to be able to produce a consistent/reproducible result from a random process in R we need to set a seed. Do so using set.seed
set.seed(50)
band_st = getValues(clr_st)
dim(band_st)
## [1] 117640 5
# getValues flattens the stack into a matrix: one row per cell (117,640) and one column per layer (5)
Remove NA values from your extracted data with na.omit for safety
Use the kmeans clustering algorithm from the stats package to cluster the extracted raster data to a specified number of clusters k (centers). Start with 12.
Once the kmeans algorithm runs, the output will be a list of components. One of these is cluster which provides a vector of integers from (1:k) indicating the cluster to which each cell was allocated.
Create a new raster object by copying one of the original bands.
vals = na.omit(band_st)                  # drop any NA rows before clustering
E = kmeans(vals, centers = 12, iter.max = 100)
kmeans_raster = clr_st$NDVI              # copy a layer to hold the clusters
values(kmeans_raster) = E$cluster        # works here because no cells were NA
Try a few different clusters (k) to see how the map changes
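That experiment can be sketched as a loop. The random matrix below is a stand-in for your getValues output; with the real data you would copy each km$cluster into a raster and plot it as above.

```r
set.seed(50)
band_demo = matrix(rnorm(500), ncol = 5)  # stand-in for getValues(clr_st)

for (k in c(4, 8, 12)) {
  km = kmeans(na.omit(band_demo), centers = k, iter.max = 100)
  # a quick fit summary; with real data, map km$cluster back onto a raster
  cat("k =", k, "-> total within-cluster SS:", round(km$tot.withinss, 1), "\n")
}
```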
To identify the flood category programmatically, generate a table crossing the values of one of your binary flood rasters with the values of your kmeans_raster. To do this, you will use the table function and pass it the values from a binary flood raster and the values from your kmeans_raster. Here the following occurs:
table builds a contingency table counting the number of times each combination of factor levels in the input vector(s) occurs. This will give us a table quantifying how many cells with a value 1 are aligned with each of the k classes, and how many cells with a value 0 are aligned with each of the k classes.
If you pass the binary flood values as the first argument to table then the unique values (0,1) will be the rows. They will always be sorted meaning you know the flooded cells will be in the second row.
which.max() returns the index of the maximum value in a vector.
Combine this information to identify the cluster in the kmeans data that coincides with the most flooded cells in the binary mask.
Once you know this value, use calc to extract the flood mask in a similar way to the thresholding you did above.
Finally use addLayer to add this new layer to your flood raster stack.
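The table/which.max step, on toy vectors standing in for values(flood) and values(kmeans_raster):

```r
binary   = c(0, 0, 1, 1, 1, 0, 1, 0)   # 1 = flooded in the threshold mask
clusters = c(1, 2, 3, 3, 3, 1, 2, 2)   # k-means class for the same cells

tbl = table(binary, clusters)           # rows = 0/1, columns = cluster ids
flood_id = as.numeric(names(which.max(tbl[2, ])))  # cluster with most 1s
flood_id                                # returns 3
```

With the real data, flood_id then feeds a calc call like `calc(kmeans_raster, function(x) ifelse(x == flood_id, 1, 0))`, and the result goes into addLayer.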
Our last goal is to identify how the flood maps compare.

### 6.1 Total Area

* Calculate the total area of the flooded cells in each image. You can use cellStats to determine the sum of each layer. Since flooded cells have a value of 1, the sum of an entire band is equivalent to the number of flooded cells. You can then use the resolution of the cell to convert counts to a flooded area.
* Print the values as a kable table.
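The counts-to-area step can be sketched on a tiny 2-layer demo stack with 30 m cells (like Landsat); swap in your real six-layer flood stack.

```r
library(raster)

# Two demo binary flood layers on a 60 m x 60 m extent (2 x 2 cells of 30 m)
fs = stack(
  raster(matrix(c(0, 1, 1, 0), 2, 2), xmn = 0, xmx = 60, ymn = 0, ymx = 60),
  raster(matrix(c(1, 1, 1, 0), 2, 2), xmn = 0, xmx = 60, ymn = 0, ymx = 60)
)

counts  = cellStats(fs, sum)      # flooded cells per layer: 2 and 3
area_m2 = counts * prod(res(fs))  # cells x (30 m x 30 m) = area in m^2
data.frame(layer = names(fs), area_m2 = area_m2)  # wrap in knitr::kable() in your Rmd
```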
This kind of work goes on regularly and is part of a couple national efforts (NOAA, USGS, FirstStreet, FEMA) to generate flood inundation libraries that contribute to better extraction and classification of realtime flood events, resource allocation during events, and damage assessments post events.
Here we used landsat imagery but the same process could be implemented on drone footage, MODIS data, or other private satellite imagery.
Your evaluation was based purely on the raster data structure and your ability to conceptualize rasters as vectors of data with dimensional structure. You applied simple mathematical operators (+, /, -) to the raster bands, and a kmeans clustering algorithm to the data matrix of the multiband raster.
Use mapview to generate a slippy map of the Palo, Iowa bbox. Find the location shown in the above image using context clues and different base maps. Once you do, do the following:
Create an sfc object from the latitude and longitude of the mouse coordinates at the impacted location.
Use the st_point constructor to create an sfg; convert it to an sfc with the appropriate lat/lng CRS; and then transform to the CRS of the flood rasters.
Use raster::extract to extract the binary flood values at that location from the six layer flood map stack
How many of the maps captured the flooding at that location?
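The point-extraction workflow can be sketched as follows. The lng/lat pair is a hypothetical placeholder (use the mouse coordinates you read off mapview), and the 2 x 2 demo raster stands in for your six-layer flood stack.

```r
library(raster)
library(sf)

# Hypothetical mouse coordinates near Palo (replace with your own)
pt = st_sfc(st_point(c(-91.79, 42.06)), crs = 4326)
pt_utm = st_transform(pt, "+proj=utm +zone=15 +datum=WGS84 +units=m +no_defs")
xy = st_coordinates(pt_utm)  # X/Y matrix in the raster's CRS

# Demo stand-in for the flood stack, covering the point
lyr = raster(matrix(1, 2, 2),
             xmn = xy[1] - 60, xmx = xy[1] + 60,
             ymn = xy[2] - 60, ymx = xy[2] + 60,
             crs = "+proj=utm +zone=15 +datum=WGS84 +units=m +no_defs")
demo_stack = stack(lyr, lyr)

vals = raster::extract(demo_stack, xy)  # binary flood value per layer
sum(vals, na.rm = TRUE)                 # how many maps flag flooding here
```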